Interpretations of the inverse relation between the duration for
which a printed word must be exposed visually in order to be recognized
and the frequency of occurrence of the word in a large general word
count are examined critically. It is found that the most satisfactory
interpretation is to regard the word's frequency in a word count as an
estimate of its average probability of emission by the population of
subjects used in the recognition experiment (base probability). Since
the threshold of recognition is defined by the probability of emission
of the word following its exposure, only a short exposure will be
necessary to bring up to threshold criterion a word whose base
probability is almost as great as the criterion probability, while a
much longer exposure will be required for a word whose base probability
is low.

The critical assumption of this interpretation is that the frequency of
a word in a large general word count represents its average probability
of emission by the experimental subjects used in the recognition
experiment. Three experiments evaluating the validity of this
assumption for the Lorge Magazine Count are described.
ON THE INTERPRETATION OF WORD FREQUENCY
AS A VARIABLE AFFECTING SPEED OF RECOGNITION


INTRODUCTION

The duration for which a printed English word must be presented
visually to a subject in order for him to recognize it is inversely
correlated with the frequency of occurrence of the word in large samples
of written English (2, 3, 4, 8, 11). Since the former quantity
(the duration threshold) is generally regarded as a perceptual variable
and the latter (word frequency) as a response variable, this
correlation offers a point of departure for the formulation of
perceptual phenomena in behavioral concepts. The object of the present
study is to test experimentally an assumption basic to one
interpretation of this correlation. A mathematical formulation of the
experimental data based upon this interpretation will be presented in a
subsequent report.

The interpretation to be considered here can be characterized as
a response-emission theory. We may think of the momentary probability
of a word (defined as the strength of the subject's tendency to emit
that word in preference to any other) as a quantity that fluctuates
widely from moment to moment in accordance with changes in innumerable
environmental and organismic conditions that affect the emission of
words. Over a time period of considerable length the average of these
momentary probabilities will be a relatively stable statistic, which
we shall call the base probability of the word.

Visual exposure of a word to a subject for a brief length of time
t is assumed to represent an environmental event tending to cause
emission of the exposed word. The momentary probability of a word
following its exposure may therefore be analysed into two components:
a component due to the ordinary impulses to emission of the word,
whose average value is the base probability; and a component due to
the additional impulse of the word's visual exposure. Consequently,
the average probability of a word following each of a number of
exposures of the same duration must be greater than the corresponding
average base probability of the word. A given level of probability
following exposure can result either from a relatively large component
due to base probability plus a small additional component due to
exposure or from a relatively small component due to base probability
plus a large additional component due to exposure. It follows that
the duration threshold of a word, which is defined as the duration of
exposure for which a specified proportion of the subject's reports
following exposure are correct, will be lower for a word with high base
probability than for a word with low base probability.
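The argument above can be illustrated with a small numerical sketch. The source gives no functional form for the exposure component (a mathematical formulation is deferred to a later report), so everything specific here is an assumption made only for illustration: the saturating exponential, the `rate` constant, the 0.5 criterion, and all function names are hypothetical.

```python
import math

def probability_after_exposure(base_probability, duration, rate=0.02):
    """Post-exposure probability: base component plus an exposure component.

    ASSUMPTION: the exposure component fills the headroom above the base
    probability along a saturating exponential. The source does not
    specify this form.
    """
    exposure_component = (1.0 - base_probability) * (1.0 - math.exp(-rate * duration))
    return base_probability + exposure_component

def duration_threshold(base_probability, criterion=0.5, rate=0.02):
    """Shortest duration at which the post-exposure probability reaches the criterion."""
    if base_probability >= criterion:
        return 0.0  # the word is already at criterion without any exposure
    # Solve 1 - (1 - b) * exp(-rate * t) = criterion for t.
    return -math.log((1.0 - criterion) / (1.0 - base_probability)) / rate

# A word with a high base probability needs a shorter exposure than a rare one.
for base in (0.001, 0.01, 0.1, 0.4):
    print(base, round(duration_threshold(base), 1))
```

Under these assumed parameters the threshold falls steadily as base probability rises, which is the qualitative relation the interpretation predicts; the particular numbers carry no empirical weight.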
In this interpretation of the experimental data, word frequency
serves as an estimate of base probability. In the cited experiments
word frequency was determined from the published tables of the Lorge
Magazine Count (10).* This count is based on a sample of about 4.5
million words of text taken from issues of five popular magazines dating
from the period 1928-1939. The principal experiment correlating duration
threshold with word frequency, however, was carried out in the summer
of 1948 with Harvard undergraduates as subjects (3). Hence the question
arises: to what extent can word frequencies based on the linguistic
behavior of magazine writers in the 1930s represent the average base
probabilities of Harvard students in 1948? We shall consider that
question here as it applies to the 75 words used in the two main
recognition experiments (3, Table 1 and p. 406).

PROCEDURE

To ascertain directly the degree of relationship between word
frequency and base probability we would need to correlate Magazine-Count
frequencies with frequencies obtained from a sample of comparable size
taken from the total linguistic production of Harvard undergraduates
in the summer of 1948. Preparation of a count of student language on
such a scale, however, is at present unfeasible. Correlation of
Magazine-Count frequencies with frequencies based on a small sample of
student language would be unsatisfactory because many of the words
used in the recognition experiments have very small probabilities that
could not be estimated from samples of less than a million words. We
are therefore forced to rely upon an indirect technique of measuring
the degree of relationship.

Three experiments are reported below. In each experiment, student
subjects were asked to rank a set of words according to the frequency
with which those words are used by their college community. The
rank of a word, averaged over all subjects, is assumed to estimate the
relative base probability of the word for that population. In other
words, it is assumed that a group of students asked which of two words
they use more frequently will, more often than not, choose the word
that in actual fact occurs more frequently. These student ranks were
then correlated with ranks for the same words based on the
Magazine-Count tables in order to obtain an estimate of the degree of
correlation between relative base probability and relative
Magazine-Count frequency. Strictly considered, the results of this type
of test apply only to ranks, although we shall see later that under
certain conditions they also apply to the frequencies themselves.


*In some of the experiments the Thorndike-Lorge Semantic Count was also
used. The correlations with duration threshold are about the same
whether the Semantic Count or the Magazine Count is used (3); only the
latter will be considered here.
The studies to be described all depend on the same computational
procedures. To obtain data representing the population of students
as a whole, the ranks assigned each word by the different subjects were
totalled and a set of average ranks computed for these totals. These
average ranks were then correlated with Magazine-Count* rank by means
of Spearman's coefficient ρ (6, p. 106). In order to correct these
rank-order coefficients for attenuation, the reliabilities of the two
sets of ranks had to be estimated. The reliability of the students'
estimates was determined by a conventional split-half technique. The
subjects were divided randomly into two subgroups and the words ranked
for each subgroup just as they were for the total group. The
reliability of the total group was then estimated from the rank-order
correlation between the two subgroups by means of the Spearman-Brown
prophecy formula (6, p. 194). The reliability of the ranks based on
Magazine-Count frequency could not be measured by a true split-half
technique, there being no way to divide the total count randomly into
two subgroups. But the Magazine Count records separately the frequency
of a word in each of the five different magazines sampled. Most of
the systematic differences between magazines can be cancelled out by
pooling three of them (The Saturday Evening Post, The Woman's Home
Companion, and The Reader's Digest) into one subgroup of 2.38 million
words and the other two (True Story and The Ladies' Home Journal)
into a second subgroup of 2.21 million words. Rank-order correlations
between the frequencies of the words in these two subgroups were used
to obtain an estimate of the reliability of the Magazine-Count ranks.
The two reliability coefficients were then used to correct for
attenuation the correlation between average student ranks and
Magazine-Count ranks.
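The computational steps just described can be sketched as follows. This is a minimal illustration using the standard textbook forms of Spearman's coefficient (a product-moment correlation computed on ranks), the Spearman-Brown formula for doubled length, and the correction for attenuation; the function names are hypothetical, and the source does not show its own worksheets.

```python
import math

def rank_with_ties(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def group_ranks(rank_lists):
    """Total each word's ranks over subjects, then rank the totals."""
    totals = [sum(word_ranks) for word_ranks in zip(*rank_lists)]
    return rank_with_ties(totals)

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Rank-order correlation: product-moment correlation of the two rank sets."""
    return pearson(rank_with_ties(x), rank_with_ties(y))

def spearman_brown(r_half):
    """Reliability of the full group estimated from the correlation between halves."""
    return 2 * r_half / (1 + r_half)

def correct_for_attenuation(r, reliability_x, reliability_y):
    """Estimated correlation between the two variables freed of measurement error."""
    return r / math.sqrt(reliability_x * reliability_y)

# Toy data: two subjects ranking three words (not taken from the experiments).
print(group_ranks([[1, 2, 3], [2, 1, 3]]))
# A raw coefficient of .87 with reliabilities .991 and .983.
print(round(correct_for_attenuation(0.87, 0.991, 0.983), 2))
```

Because the correction divides by the geometric mean of the two reliabilities, it changes a coefficient very little when both reliabilities are near 1.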


EXPERIMENTS

Experiment 1. This experiment was carried out in 1948 using 24
Harvard College students as subjects. It is therefore directly
applicable to the population used in the experiments on duration
threshold. The words were the 25 rarest ones in the list of 60 used in
the main threshold experiment (3, Table 1). It has generally been
supposed (e.g., 1) that the frequencies of rare words are more apt to
depend on peculiarities of the sample chosen for a word count than are
the frequencies of common words. The correlation for the entire set of
60 words thus should be at least as high as the correlation for the 25
rarest ones.

The subjects' instructions were to "rank the words in order of their
frequency of occurrence among Harvard undergraduates. Frequency of usage
refers to occurrence in all forms of language -- speaking and listening,
reading and writing. A slight emphasis on speaking frequencies is
probably justified since speech is probably the most basic form of
language. Use your own verbal behavior as typical of 'Harvard
undergraduates' but take into account any very atypical idiosyncrasies
of your speech." The subjects were also told to consider the words in
their specific grammatical forms, not as root words. Since the words
were all taken from the lowest frequencies in the Magazine Count, there
were a number of cases in which different words had the same
Magazine-Count frequency. Wherever possible, such ties were broken by
giving the higher rank to the word whose frequency is higher in the
Thorndike-Lorge Semantic Count (10). This count is based on a sample of
approximately the same size as the Magazine Count but taken from more
literary sources, such as the Encyclopedia Britannica, the Literary
Digest, and miscellaneous novels and textbooks.


*The unpublished version, giving the frequencies for words in their
fully-inflected forms, was used. I wish to thank Dr. Irving Lorge for
permission to use this version of the count.

Table 1 gives the Magazine-Count ranks and the average student
ranks for the 25 words of Experiment 1. The correlation between them
is .71, corrected for attenuation to .78. These correlations are well
above the usual criteria for statistical significance (in testing
significance ρ was interpreted as r). The reliabilities are also of
interest: for the students' rankings the reliability is .988; for
the Magazine-Count ranks, .835.


                              TABLE 1

              Words of Experiment 1 Listed in Order of
        Magazine-Count Rank with Their Average Student Ranks

  (Brackets Inclose Words with Identical Magazine-Count Frequencies)

    Word           Rank             Word            Rank

    celestial      11               erudition        6.5
    assiduous      [illegible]      conviviality    14.5
    benign         [illegible]      etcher          23
    altruistic     [illegible]      psychical       22
    amicable        6.5             inductive        3
    mundane         5               pedagogue       13
    condolence     12               vignette        20
    metaphor        1               theistic        19
    frugality      10               statics         18
    beatific       16               elegies         17
    barrister      14.5             chancels        25
    rebuttal        4               uncoerced       21
    percipience    24

Experiment 2. This study, also carried out with Harvard College
students in 1948, was designed to test the upper limit of the students'
ability to estimate base probabilities. To this end the 15 words used
in a supplementary threshold experiment (3, p. 406) were selected.
These words cover a large range of Magazine-Count frequency (0 to 1500)
and differ by approximately equal logarithmic distances so that no two
words are in the same frequency range. High reliability was assured
by including only words that have the same rank in both Magazine and
Semantic Counts. Fourteen subjects were used (no subject was used in
both Experiments 1 and 2). The instructions were to rank the words
"on the basis of how common you think these words are in their usage
by college students."

Magazine-Count ranks and average student ranks for these words
appear in Table 2. The correlation is .87, corrected for attenuation
to .88. The reliability of the students' ranks is .991; that of the
Magazine-Count ranks, .983.


                              TABLE 2

              Words of Experiment 2 Listed in Order of
        Magazine-Count Rank with Their Average Student Ranks

    Word           Rank             Word            Rank

    country         1.5             [illegible]      7
    promise         5               sunrise         11
    example         1.5             dwindle          9
    balance         3               irksome         12
    welfare         6               vulture         13
    venture         8               machete         15
    deserve         4               titular         14
    figment        10

Experiment 3. A further study was carried out in 1953 on all
60 words of the main threshold experiment using Antioch College students
as subjects. The 60 words were divided into three lists having
approximately the same distributions of Magazine-Count frequencies. Each
list was given to 10 subjects. The instructions were to "rank the words
according to the frequency with which you think they are used in the
Antioch Community. Frequency of usage here includes all occurrences
of words, in both written and spoken language, during the present school
year (1952-3). Consider the words exactly as they are written, not
their roots or related forms."
The results are shown in Table 3. The correlations are .81, .82,
and .57, respectively, for the three parts of the experiment. Corrected
for attenuation, these become .85, .84, and .60. All are statistically
reliable. The reliabilities for the students' estimates are .950, .982,
and .942; for the Magazine-Count ranks, .978, .950, and .952,
respectively.


                              TABLE 3

              Words of Experiment 3 Listed in Order of
        Magazine-Count Rank with Their Average Student Ranks

  (Brackets Inclose Words with Identical Magazine-Count Frequencies)

         Part A                   Part B                   Part C

    Word         Rank        Word          Rank       Word         Rank

    picture       2          government     4         service       3
    market        6          education      1.5       knowledge     2
    friendly      1          sympathy   [illegible]   automobile    7.5
    savings       3          scientific     1.5       lawyer        9
    spiritual     7          painting       9         religious     5
    hospitality   9          orchestra     10         churches     10
    literary      3          economics      6         heavenly     11
    ensemble     12.5        poetry         8         broker    [illegible]
    assets        5          intellectual   3         limousine    17.5
    charitable   14          initiative     5         liberties     1
    earthly   [illegible]    benign        17         assiduous    15
    reverence    12.5        amicable      12         physics       4
    debating      4          mundane       11         condolence   12
    celestial    11          metaphor   [illegible]   judiciary     7.5
    altruistic   10          beatific      19         frugality    13
    uncoerced    15          rebuttal      14         erudition    16
    statics      16          barrister     20         etcher       17.5
    vignette     18          conviviality  18         inductive     6
    pedagogue    19          theistic      15         percipience  19
    chancels     20          psychical     16         elegies      20

Application to relative base probabilities. These experiments
give rank correlations between word frequency and student estimates of
relative base probability. They can be interpreted as rank
correlations between word frequency and base probability only on the
assumption that the student estimates of relative base probability were
correct. Some tests of that assumption are considered in this section.
In Experiments 2 and 3 each subject was asked to give a second set
of rankings in which he ordered the words according to their frequencies
in his own personal usage. The exact instructions in Experiment 2 were
to rank the words "on the basis of your impression of how common the
words are to you personally"; and in Experiment 3, to "rank the words
according to the frequency with which you yourself use them." It will
be convenient to refer to these as personal ranks and the ranks
previously described, where the words were ranked according to their use
by the entire college population, as college ranks. Comparison of the
two sets of ranks permits us to test the ability of the students to
rank base probabilities.

Let us observe how the two sets of student rankings are related.
The base probability of a word, if we could measure it, would be found
to vary somewhat from one student to another. If these personal base
probabilities were averaged over all students in the defined student
population, the result would give the base probability of the word for
the college population. The assumption that the students were correct
in their estimates of relative base probability then implies the
following relations:

(1) Personal ranks should vary more from one subject to another
than college ranks, since the former reflect real differences in base
probability as well as errors of estimation while the latter reflect
errors of estimation only. Hence the interpersonal correlations for
personal ranks should be lower than the corresponding interpersonal
correlations for college ranks.

(2) Personal ranks averaged over the entire population of subjects
should (when a sufficiently large sample of the population is taken)
equal the average college ranks.* From this it follows that
Magazine-Count ranks should correlate as highly with average personal
ranks as with average college ranks.

The first deduction can be tested by comparing the rank
correlations for all possible combinations of subjects in Experiments 2
and 3. For each type of ranking there are a total of C(14,2) = 91
correlations in Experiment 2 and a total of C(10,2) = 45 correlations in
each part of Experiment 3. Well over half the correlations are higher
for the college ranks, the proportions being 57/91 in Experiment 2 and
25/45, 32/45, and 29/45 for the three parts of Experiment 3. The total
proportion, 143/226, is four standard-error units above the ratio of
113/226 that would be expected if there were no difference between the
two types of correlations. The superiority of the correlations for
college ranks can also be shown by t tests for the differences between
mean correlations. The t's are 3.5 for Experiment 2 and 3.2, 4.3, and
3.8 for the three parts of Experiment 3, all significant at the .01
level.


*This assumes that ranks according to frequency can be averaged like
the actual frequencies, an assumption that is not generally valid. But
the deduction here concerns only the relative magnitudes of the ρ's; any
bias introduced by the averaging of ranks should affect all ρ's alike,
since by Zipf's law (12) the distributions of word frequency will be
similar. Hence the conclusions should not be affected materially by the
averaging of ranks.
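The counts and the "four standard-error units" in the test of the first deduction can be rechecked with a few lines of arithmetic. The sketch below assumes the comparison was treated as a binomial with chance proportion one half and with the 226 comparisons regarded as independent; the source does not state its formula, and the variable names are illustrative only.

```python
import math

# Number of subject pairs: 14 subjects in Experiment 2, 10 in each part of Experiment 3.
pairs = [math.comb(14, 2)] + [math.comb(10, 2)] * 3

# Pairs whose correlation was higher for college ranks than for personal ranks.
higher_for_college = [57, 25, 32, 29]

total_pairs = sum(pairs)
total_higher = sum(higher_for_college)

# ASSUMPTION: binomial model with p = 1/2 under "no difference".
expected = total_pairs / 2
standard_error = math.sqrt(total_pairs * 0.5 * 0.5)
z = (total_higher - expected) / standard_error

print(pairs, total_higher, total_pairs, round(z, 2))
```

Pairs that share a subject are not strictly independent, so the result is best read as a rough index of the size of the effect rather than an exact significance level.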

According to the second deduction, the correlations between average
personal ranks and average college ranks should approach unity after
correction for attenuation. The raw correlations between average ranks
are .96[?] for Experiment 2 and .964, .990, and .962 for the three parts
of Experiment 3. Corrected for attenuation, they become .978, .998,
1.016, and 1.006, respectively. The root-mean-square of these
coefficients, .9996, is remarkably close to the predicted value of 1.
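The root-mean-square quoted for the four corrected coefficients is easy to reproduce. The snippet below only restates that arithmetic; the variable names are illustrative.

```python
import math

# Corrected correlations between average personal and average college ranks.
corrected = [0.978, 0.998, 1.016, 1.006]

root_mean_square = math.sqrt(sum(r * r for r in corrected) / len(corrected))
print(round(root_mean_square, 4))
```

Coefficients above 1 are possible after correction for attenuation because the reliabilities in the denominator are themselves sample estimates.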

It remains to see whether the Magazine-Count ranks correlate as
highly with average personal ranks as with average college ranks.
The respective rank correlations, both with and without correction for
attenuation, are shown in Table 4. The root-mean-square correlation
is .778 for average personal ranks, .773 for average college ranks.
After correction for attenuation the root-mean-square correlations are
.801 and .799, respectively. In both cases the differences are
negligible.


                              TABLE 4

           Correlations of Personal and of College Ranks
            with Magazine-Count Ranks, With and Without
                    Correction for Attenuation

                               Type of Correlation

                          Uncorrected            Corrected
    Experiment           Pers     Coll         Pers     Coll

    2                    .89      .87          .90      .88
    3-A                  .7[?]    .81          .79      .85
    3-B                  .83      .82          .86      .84
    3-C              [illegible]  .57          .62      .60

    Root-mean-squares    .78      .77          .80      .80
The results show that personal ranks differ more from subject to
subject than do college ranks, but that these individual differences in
personal ranks cancel out in such a way as to approach the college
ranks for groups of 10 subjects or more. This is the picture that
would be expected on the assumption that the students' estimates of
relative base probability were valid.

Application to base probabilities. The data reported above concern
only the order of words with respect to their frequency of occurrence,
and the conclusions have been phrased accordingly. It is possible to
extend the conclusions to the actual frequencies, however, without
violating any of the assumptions underlying the statistical procedure.
The coefficient ρ measuring the correlation between two sets of ranks
also measures the correlation between the variables ranked if those
variables are distributed rectangularly (5, pp. 106f.). Now the
logarithms of the Magazine-Count frequencies of the words used in each
of the above experiments form approximately rectangular distributions
(cf. 3, Fig. 1). The actual base probabilities of the words in the
students' usage are not known, so the form of their distribution
cannot be determined directly. But it is reasonable to suppose that
their distributions do not differ materially in form from the
Magazine-Count distributions in view of Zipf's evidence that the
distribution of word frequencies has the same mathematical form for all
heterogeneous samples of language (12). The rank correlations reported
here may thus be regarded as close approximations to the product-moment
correlations between log Magazine-Count frequency and log base
probability.
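The statement that a rank correlation approximates the product-moment correlation when both variables are rectangularly distributed can be illustrated by simulation. The sketch below uses synthetic data only: the sample size, the seed, the 0.7 mixing proportion, and the names are all assumptions chosen for illustration and have no connection with the experimental data.

```python
import math
import random

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n by increasing value (ties are not expected for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000
x = [rng.random() for _ in range(n)]
# y copies x with probability 0.7 and is otherwise an independent uniform draw,
# so both variables are rectangular and correlate at roughly 0.7.
y = [xi if rng.random() < 0.7 else rng.random() for xi in x]

product_moment = pearson(x, y)
rank_order = pearson(ranks(x), ranks(y))
print(round(product_moment, 3), round(rank_order, 3))
```

For rectangular variables the ranks are very nearly a linear function of the values, which is why the two coefficients should agree closely; for strongly skewed variables, such as raw word frequencies, they need not.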

On the basis of Experiments 1 and 3 the reliability of measurements
of log base probability for the words of the main threshold experiment
can be put at about .75. The raw product-moment correlation between
log Magazine-Count frequency and mean duration threshold in that
experiment is -.68. When the thresholds have been corrected for certain
stimulus characteristics the raw correlation becomes -.76. The
reliability of mean duration threshold in that experiment is about .90.
Hence the correlation between duration threshold and log base
probability, corrected for attenuation, can be estimated at about -.83
without the correction for stimulus characteristics or about -.93 with
that correction. These indicate a high degree of rectilinear
relationship between the two variables.
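The two corrected estimates follow from the standard correction for attenuation applied to the figures in the preceding paragraph. The sketch below assumes the usual textbook form of that correction; the function and variable names are illustrative.

```python
import math

def correct_for_attenuation(r, reliability_x, reliability_y):
    """Raw correlation divided by the geometric mean of the two reliabilities."""
    return r / math.sqrt(reliability_x * reliability_y)

reliability_log_base_probability = 0.75
reliability_duration_threshold = 0.90

raw = -0.68                   # log Magazine-Count frequency vs. mean threshold
raw_stimulus_adjusted = -0.76 # after correcting thresholds for stimulus characteristics

corrected = correct_for_attenuation(
    raw, reliability_log_base_probability, reliability_duration_threshold)
corrected_stimulus_adjusted = correct_for_attenuation(
    raw_stimulus_adjusted, reliability_log_base_probability, reliability_duration_threshold)

print(round(corrected, 2), round(corrected_stimulus_adjusted, 2))
```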


DISCUSSION

Other interpretations of the correlation between word frequency
and duration threshold have been suggested. There are two main points
in which these differ from the present interpretation. Usually word
frequency is interpreted as the frequency with which a word has occurred
in the past history of the subject rather than an estimate of the
probability of the word at the time of recognition, and as the frequency
with which the subjects see or read the word rather than the frequency
with which they emit it as a response (5, 7, 8, 9, 11). Some of the
reasons for rejecting these points of view in favor of the present
interpretation are mentioned in this section.

According to the first alternate interpretation, Magazine-Count
frequencies serve as estimates of the frequencies with which words have
occurred in the past histories of 1948 Harvard undergraduates. If
we considered only the students' linguistic histories immediately prior
to the recognition experiments, the error of estimation introduced by
differences between these histories and the material used for the
Magazine Count would be about the same as the error for the
response-emission interpretation. But error of a greater order of
magnitude must be expected when the Magazine Count, based on adult
language, is used to represent the students' language during childhood
and adolescence, which make up the major part of their total linguistic
histories. It is not quite clear, moreover, just how a word's frequency
of previous occurrence could have an appreciable effect upon its
threshold. Almost all studies indicate that learning reaches an
asymptote as a function of practice. Thus repetition of an event after
a very large number of previous repetitions has no appreciable effect on
behavior. Even rare words with frequencies of 5 or 10 in the Magazine
Count must have occurred often enough among the total production of
words in a student's life history to have reached the asymptote of
learning. Hence the observed differences in duration threshold between
words of these low frequencies and words of much higher frequencies can
hardly be attributed to differences in the number of times the words
have occurred in the subjects' pasts.

The second point to be considered is whether Magazine-Count
frequency should be interpreted as an estimate of the frequency with
which the subjects read a word or the frequency with which they emit it.
Both the language read and the language emitted by Harvard
undergraduates can be expected to differ significantly from the material
of the Magazine Count. The reading-frequency interpretation thus
involves the same kind of error that is estimated in the previous
section for the response-frequency interpretation. But additional
sources of error will affect the estimation of reading frequencies. In
normal reading a person often skips connective words or words at the
ends of lines when the meanings are indicated by context, and a
difficult passage or unfamiliar word, on the other hand, may be read
over and over. Two words occurring with the same frequency in a sample
of reading material, consequently, may differ considerably with respect
to the number of times they are actually read. Furthermore, it is
difficult to specify the frequency with which a word is read except in
terms of the time spent in reading it, since there is no simple
observable response that can serve as a criterion of reading. But the
amount of time spent in reading a word will be affected by many factors
other than the number of times it appears in the material that is read.
Hence the Magazine Count will give a far more erroneous estimate of the
frequency with which a word is read by Harvard undergraduates than of
the frequency with which they emit it.

Some recent experimental evidence also counters the
reading-frequency interpretation. The frequency with which students
actually read a word is better estimated from the frequency of the
constituent sequence of letters, computed without regard to their
occurrence as a complete word, than from the frequency of the complete
word. Experiment, however, shows no correlation at all between
letter-sequence frequencies and duration threshold ([?], p. 76). While
it can be argued that these results are not conclusive, since the
letter-sequence frequencies were based on a much smaller sample than the
word frequencies, the complete lack of correlation strongly suggests
that the frequency with which a word is read cannot account for the
correlations of .6 to .7 that have been found for word frequency and
duration threshold.

SUMMARY

An interpretation of the inverse relationship between the duration
threshold of a word and its frequency of occurrence is outlined.
According to this interpretation, the frequency of a word in the
Thorndike-Lorge tables (10) serves as an estimate of the frequency
with which college students would have used that word at the time the
duration thresholds were measured if the measurements had not been
made. The validity of this estimate is tested by three experiments
based on a rank-correlation procedure. Additional experiments provide
a check on the method. The results indicate a validity of about .75
for the subjects used in the principal experiment on duration threshold.
Some reasons for preferring the proposed interpretation to others
that have been suggested are briefly mentioned.